An Approach to Take Multi-Word Expressions
Abstract
This research discusses preliminary efforts to expand the coverage of the PropBank lexicon to multi-word and idiomatic expressions, such as "take one for the team". Given the overwhelming number of such expressions, an efficient way of increasing coverage is needed. It describes an approach to adding multi-word expressions to the PropBank lexicon in an effective yet semantically rich fashion. The pilot discussed here uses double annotation of "take" multi-word expressions, where the annotations provide information on the best strategy for adding each multi-word expression to the lexicon. This work represents an important step toward enriching the semantic information included in the PropBank corpus, which is a valuable and comprehensive resource for the field of Natural Language Processing.
Similar resources
Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
Corpus-Driven Study of Multi-Word Expressions Based on Collocations from a Very Large Corpus
We present a corpus-driven approach to the study of multi-word expressions, which constitute a significant part of. As a data basis, we use collocation profiles computed from DeReKo (Deutsches Referenzkorpus), the largest available collection of written German which has approximately two billion word tokens and is located at the Institute for the German Language (IDS). We employ a strongly usag...
Identifying Multi-word Expressions by Leveraging Morphological and Syntactic Idiosyncrasy
Multi-word expressions constitute a significant portion of the lexicon of every natural language, and handling them correctly is mandatory for various NLP applications. Yet such entities are notoriously hard to define, and are consequently missing from standard lexicons and dictionaries. Multi-word expressions exhibit idiosyncratic behavior on various levels: orthographic, morphological, syntac...
An Efficient, Generic Approach to Extracting Multi-Word Expressions from Dependency Trees
The Varro toolkit offers an intuitive mechanism for extracting syntactically motivated multi-word expressions (MWEs) from dependency treebanks by looking for recurring connected subtrees instead of subsequences in strings. This approach can find MWEs that are in varying orders and have words inserted into their components. This paper also proposes description length gain as a statistical correl...
Multi-Word Verbs In A Flective Language: The Case Of Estonian
This paper describes the automatic treatment of multi-word expressions in a morphologically complex flective language – Estonian. It focuses on a special type of multi-word expressions – the verbal multi-word expressions that can function as predicates. The authors describe two language resources – a database of verbal multi-word expressions and a corpus where these items have been annotated manually. ...